Introduction

“Provide background on your data sets and a clear formulated question or hypothesis.”

For this project, my question of interest is: “How does crime rate relate to income in Canada?”

To answer this question, data on both crime and socioeconomic status are needed. However, I found no existing data set that contains all of the desired information, so it has to be assembled by merging more than one data set. After choosing carefully, I obtained the following two data sets:

  1. “Income of individuals by age group, sex and income source, Canada, provinces and selected census metropolitan areas”. Released 2023-05-02. This data set is updated annually and maintained by Statistics Canada (Table 11-10-0239-01). Data are collected through the Survey of Labour and Income Dynamics, the Survey of Consumer Finances, and the Canadian Income Survey.

  2. “Incident-based crime statistics, by detailed violations, Canada, provinces, territories, Census Metropolitan Areas and Canadian Forces Military Police”. Released 2023-07-27. This data set is also annually updated and maintained by Statistics Canada (Table 35-10-0177-01, formerly CANSIM 252-0051). Data is collected through the Uniform Crime Reporting Survey.

Understanding the relationship between crime rates and income in Canada is crucial for policymakers, law enforcement agencies, and social welfare programs. Exploring this correlation can shed light on the socioeconomic factors driving criminal behavior and help formulate targeted interventions to alleviate poverty and reduce crime. Additionally, elucidating this connection can inform broader discussions on social inequality, justice, and community well-being in Canadian society.

Methods

“Include how and where the data were acquired, how you cleaned and wrangled the data, what tools you used for data exploration.”

Both data sets were downloaded directly from Statistics Canada, which is generally considered a reliable source. Because they share the same source, the data sets follow a similar structure and both contain the columns GEO and REF_DATE, where the former refers to the geographical region and the latter to the reference year. Thus, it is possible to combine the two data sets to obtain all the information needed.

However, both data sets are very large and contain information unrelated to the question. Cleaning and wrangling are therefore needed for more convenient analysis and more efficient computing and uploading; the original files are too large to be pushed to a GitHub repository.

Reference:

  1. The census data set: https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=1110023901
  2. The crime data set: https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=3510017701
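library(dplyr)    # %>%, filter, group_by, summarise, inner_join
library(tidyr)    # pivot_wider
library(ggplot2)  # plots

fn1 <- "https://raw.githubusercontent.com/inorrr/JSC370_project/main/census.csv"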
fn2 <- "https://raw.githubusercontent.com/inorrr/JSC370_project/main/crime.csv"

if (!file.exists("census.csv"))
  download.file(fn1, destfile = "census.csv")
census_df <- data.table::fread("census.csv")

if (!file.exists("crime.csv"))
  download.file(fn2, destfile = "crime.csv")
crime_df <- data.table::fread("crime.csv")

Data Wrangling and Cleaning

  1. First, I dropped the unrelated columns in both data sets, keeping only the information relevant to the question of interest.
crime_df <- crime_df[, c("REF_DATE", "GEO", "Violations", "Statistics", "VALUE", "UOM")]
census_df <- census_df[, c("REF_DATE", "GEO", "Age group", "Sex", "Income source", "Statistics", "VALUE", "UOM", "SCALAR_FACTOR")]
  2. In the GEO column, I noticed that:
  • Both data sets use a mix of province and city names. I removed all observations with city names and kept only the province-level rows, which makes the data smaller and avoids redundancy (the province totals already include the cities).
  • The census data only cover the 10 provinces, not the 3 territories, so I filtered the territories out of the crime data as there are no matching rows.
  • In the crime data, each province name is followed by a number in square brackets. I removed it with a regular expression so that the two data sets can later be joined on province.
table(census_df$GEO)

provinces1 <- c("Alberta [48]", "British Columbia [59]", "Manitoba [46]", "New Brunswick [13]", 
               "Newfoundland and Labrador [10]", "Saskatchewan [47]",
               "Nova Scotia [12]", "Ontario [35]", 
               "Prince Edward Island [11]", "Quebec [24]")

provinces2 <- c("Alberta", "British Columbia", "Manitoba", "New Brunswick", 
               "Newfoundland and Labrador", "Saskatchewan","Nova Scotia", 
               "Ontario", "Prince Edward Island", "Quebec")

crime_df <- crime_df[crime_df$GEO %in% provinces1, ]
census_df <- census_df[census_df$GEO %in% provinces2, ]

crime_df$GEO <- gsub("\\s*\\[\\d+\\]$", "", crime_df$GEO)
table(crime_df$GEO)
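
As a small illustration of the regular expression used above, the sketch below applies it to a few made-up labels (`geo_raw` is a hypothetical vector, not part of the data sets):

```r
# Hypothetical sample labels in the style of the crime data's GEO column
geo_raw <- c("Alberta [48]", "Prince Edward Island [11]", "Quebec")

# \\s*        optional whitespace before the bracket
# \\[\\d+\\]  a literal "[", one or more digits, and a literal "]"
# $           anchors the match at the end of the string
geo_clean <- gsub("\\s*\\[\\d+\\]$", "", geo_raw)

print(geo_clean)
```

Labels without a bracketed suffix, such as the third one, pass through unchanged, so the same call is safe to apply to names that are already clean.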
  3. The Statistics column of the crime data contains many crime-related measures, but I am only interested in the number of incidents (Actual incidents) and the crime rate (Rate per 100,000 population), so the statistics related to charges are removed. The Crime Severity Index measure (Percentage contribution to the Crime Severity Index (CSI)) seems interesting and is therefore kept.
crime_df <- crime_df %>% filter(Statistics == "Actual incidents" | 
                                Statistics == "Rate per 100,000 population" | 
                                Statistics == "Percentage contribution to the Crime Severity Index (CSI)")
  4. In the crime data frame:
  • There are 314 different types of crime, a much more detailed categorization than I need. To avoid excessive computation when merging the data later, I chose to keep only the broader categories (e.g. total robbery, total assaults).
  • The violation names also end with a number in square brackets, so I removed it.
print(length(unique(crime_df$Violations)))
table(crime_df$Violations)

# Identify rows that start with "Total"
total_rows <- grepl("^Total", crime_df$Violations)

# Subset the dataframe to keep only the rows starting with "Total"
crime_df <- crime_df[total_rows, , drop = FALSE]

# Remove square brackets and numbers at the end
crime_df$Violations <- gsub("\\s*\\[\\d+\\]$", "", crime_df$Violations)
  5. In the census data frame:
  • The column Age group specifies the age group. Since the crime data frame has no such information, the age groups need to be combined, which I did by taking the (unweighted) mean of VALUE across the categories.

  • The same method is used for Sex.

table(census_df$"Age group")
table(census_df$"Sex")

# first we merge the age group categories (Age group is left out of the grouping)
census_df <- census_df %>% 
  group_by(REF_DATE, GEO, Sex, `Income source`, Statistics, UOM, SCALAR_FACTOR) %>% 
  summarise(VALUE = mean(VALUE, na.rm = TRUE), .groups = "drop")

# next we merge the sex categories (Sex is left out of the grouping)
census_df <- census_df %>% 
  group_by(REF_DATE, GEO, `Income source`, Statistics, UOM, SCALAR_FACTOR) %>% 
  summarise(VALUE = mean(VALUE, na.rm = TRUE), .groups = "drop")
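
To make the collapsing step concrete, here is a minimal sketch on made-up numbers (the data frame `toy`, its category labels and its values are hypothetical). It uses the same two-step unweighted mean, so every category counts equally regardless of how many people it represents:

```r
library(dplyr)

# Hypothetical toy data: one province-year with two age groups and two sexes
toy <- data.frame(
  REF_DATE = 2020,
  GEO = "Ontario",
  `Age group` = c("Younger", "Younger", "Older", "Older"),
  Sex = c("Males", "Females", "Males", "Females"),
  VALUE = c(20000, 18000, 60000, 50000),
  check.names = FALSE
)

# Step 1: average over age groups within each sex
by_sex <- toy %>%
  group_by(REF_DATE, GEO, Sex) %>%
  summarise(VALUE = mean(VALUE, na.rm = TRUE), .groups = "drop")

# Step 2: average over the sexes
overall <- by_sex %>%
  group_by(REF_DATE, GEO) %>%
  summarise(VALUE = mean(VALUE, na.rm = TRUE), .groups = "drop")

print(by_sex)
print(overall)
```

If the source table already provides an aggregate category (for example an "all ages" row), filtering to that row would be an alternative to averaging; whether such a row exists is an assumption to check against the data.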
  6. I kept only the data between 1998 and 2021, as that is the year range where the two data sets overlap.
crime_df <- crime_df %>% filter(REF_DATE >= 1998 & REF_DATE <= 2021)
census_df <- census_df %>% filter(REF_DATE >= 1998 & REF_DATE <= 2021)
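
One way to find such an overlap programmatically is to intersect the years present in each data set. A minimal sketch, where the two year vectors are hypothetical stand-ins for `unique(crime_df$REF_DATE)` and `unique(census_df$REF_DATE)` before filtering:

```r
# Hypothetical year coverage of each data set (not read from the real files)
crime_years  <- 1998:2022
census_years <- 1976:2021

# Years that appear in both data sets
common_years <- intersect(crime_years, census_years)

# First and last overlapping year
overlap <- range(common_years)
print(overlap)
```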
  7. Finally, I wrote the cleaned data frames to CSV files.
# relative paths keep the script portable; row.names = FALSE avoids an extra index column
write.csv(crime_df, "crime.csv", row.names = FALSE)
write.csv(census_df, "census.csv", row.names = FALSE)

At this point, the crime and census data frames have REF_DATE and GEO in common, and each has one further categorical variable: Income source for the census data and Violations (the crime type) for the crime data. While it may seem natural to join the two data sets on REF_DATE and GEO directly, the result would contain every combination of Income source and Violations for each REF_DATE and GEO. This would be a huge data set and would slow down computation. Therefore, I chose to keep the data sets separate and join them only when necessary (i.e. after picking out the categories of interest).
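
The row blow-up described here can be seen on a tiny example. The sketch below uses base `merge()` on two made-up tables (`crime_toy`, `census_toy` and their values are hypothetical) and compares joining everything with filtering first:

```r
# Hypothetical toy tables that share the keys REF_DATE and GEO
crime_toy <- data.frame(
  REF_DATE = 2020, GEO = "Ontario",
  Violations = c("Total robbery", "Total mischief", "Total assaults"),
  rate = c(50, 700, 450)
)
census_toy <- data.frame(
  REF_DATE = 2020, GEO = "Ontario",
  `Income source` = c("Total income", "Employment income"),
  avg_income = c(55000, 48000),
  check.names = FALSE
)

# Joining on the keys alone pairs every violation with every income source
joined_all <- merge(crime_toy, census_toy, by = c("REF_DATE", "GEO"))
print(nrow(joined_all))  # 3 violations x 2 income sources

# Filtering each table to one category first keeps one row per key
joined_one <- merge(
  crime_toy[crime_toy$Violations == "Total robbery", ],
  census_toy[census_toy$`Income source` == "Total income", ],
  by = c("REF_DATE", "GEO")
)
print(nrow(joined_one))
```

With dozens of violation types and income sources per province-year, the unfiltered join grows by the product of the two category counts, which is why filtering before joining is the cheaper order.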

Exploratory Data Analysis

Both data sets are currently in long format, so I converted them to wide format for convenience.

crime_df <- pivot_wider(crime_df, id_cols = c(REF_DATE, GEO, Violations), 
                        names_from = Statistics, values_from = VALUE)
crime_df <- na.omit(crime_df)

census_df <- pivot_wider(census_df, id_cols = c(REF_DATE, GEO, `Income source`), 
                         names_from = Statistics, values_from = VALUE)
census_df <- na.omit(census_df)
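
For readers unfamiliar with the long-to-wide step, here is a minimal sketch on made-up rows (`long_toy` and its values are hypothetical). It also shows that `na.omit()` afterwards drops any row for which one of the statistics is missing:

```r
library(tidyr)

# Hypothetical long-format data: 2021 has no rate recorded
long_toy <- data.frame(
  REF_DATE = c(2020, 2020, 2021),
  GEO = "Ontario",
  Violations = "Total robbery",
  Statistics = c("Actual incidents", "Rate per 100,000 population",
                 "Actual incidents"),
  VALUE = c(1000, 6.8, 1100)
)

# One row per (REF_DATE, GEO, Violations), one column per statistic
wide_toy <- pivot_wider(long_toy,
                        id_cols = c(REF_DATE, GEO, Violations),
                        names_from = Statistics, values_from = VALUE)
print(wide_toy)

# Rows with any missing statistic are removed
complete_toy <- na.omit(wide_toy)
print(nrow(complete_toy))
```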

Check the dimensions and headers and footers of the data

dim(census_df)
dim(crime_df)
head(crime_df)
tail(crime_df)
head(census_df)
tail(census_df)

The census data set has 8 variables and 3613 observations; the crime data set has 6 variables and 9271 observations. Judging from the first and last rows, both data sets seem to have been imported correctly and contain no missing values (in the displayed rows).

Check the variable types in the data

str(census_df)
str(crime_df)
summary(census_df)
summary(crime_df)

In both data frames, the variable types are a mix of integer, numeric and character. All variable types align with the meaning of the variables. No major problems arise with the data at this stage (e.g. a variable with all values missing).

Take a closer look at some/all of the variables

For both data frames, REF_DATE and GEO should identify a valid year and a province in Canada. For the census data frame, we need to look at the values of the different income statistics (median, aggregate, etc.). For the crime data frame, we need to check that the recorded crime rate and the actual number of incidents are within a reasonable range.

table(census_df$REF_DATE)
table(census_df$GEO)
table(crime_df$REF_DATE)
table(crime_df$GEO)
summary(census_df$`Aggregate income`)
summary(census_df$`Average income (excluding zeros)`)
summary(census_df$`Median income (excluding zeros)`)
summary(crime_df$`Actual incidents`)
summary(crime_df$`Rate per 100,000 population`)

Both data sets contain data from 1998 to 2021 on the 10 provinces of Canada, as expected from the cleaning steps. The other variables checked are within a reasonable range. The aggregate, average and median incomes are all measured in 2021 constant dollars, with aggregate income recorded in millions. The crime rates are measured as the number of incidents per 100,000 population.

Validate with an external source

Notice that the minimum average income is 677.8, which is much lower than the mean average income and even 10 times lower than the 1st quartile. Since this looks suspicious, it needs to be validated.

census_df[which.min(census_df$`Average income (excluding zeros)`), ]

This observation is from Prince Edward Island in 2004, and the income source is “other government transfers”. Upon research, government transfers refer to assistance from provincial and municipal programs, Workers’ Compensation benefits, the GST/HST Credit and provincial refundable tax credits such as the Quebec and Newfoundland and Labrador sales tax credits. However, since many of these are given their own categories and are excluded from “other government transfers” in the data set, it makes sense that the value is low.
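
The comparison between the minimum and the first quartile can be checked directly. A minimal sketch on made-up numbers (the vector `avg_income` is hypothetical and only mimics one unusually small value):

```r
# Hypothetical average-income values with one very small entry
avg_income <- c(700, 9000, 12000, 15000, 40000)

q1 <- unname(quantile(avg_income, 0.25))  # first quartile
ratio <- q1 / min(avg_income)             # how many times smaller the minimum is

print(q1)
print(ratio)

# Position of the suspicious observation, as with which.min() above
print(which.min(avg_income))
```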

Preliminary Results

“Provide summary statistics in tabular form and publication-quality figures, take a look at the kable function from knitr to write nice tables in Rmarkdown.”

Looking at Crime Data

Firstly, I examined the trend in crime rate across provinces and crime types.

filtered_data = crime_df %>% filter(Violations=="Total, all violations")
unique_x_values <- unique(crime_df$REF_DATE)
ggplot(filtered_data, aes(x = REF_DATE, y = `Rate per 100,000 population`, color = GEO)) +
  geom_line() +
  labs(x = "Year", y = "Rate per 100,000 population", title = "Rates of Total Crime by Province") +
  scale_x_continuous(breaks = unique_x_values) + 
  scale_color_discrete(name = "Provinces") + 
  theme_linedraw() +
  theme(legend.position = "right") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) + 
  theme(plot.title = element_text(face = "bold")) + 
  theme(plot.margin = margin(0.5, 0.5, 0.5, 0.5, "cm")) + 
  theme(axis.text = element_text(size = 7)) + 
  theme(legend.title = element_text(face = "bold"))

The plot depicts the total crime rate, encompassing all types of crimes, from 1998 to 2021, with each province represented by a distinct color. The crime rate is measured per 100,000 population. Notably, Saskatchewan consistently exhibits a significantly higher crime rate compared to other provinces throughout the period of 1998 to 2021. Conversely, Quebec and Ontario consistently demonstrate the lowest crime rates. Across all provinces, there is a discernible decreasing trend in crime rates over the years, with many provinces experiencing peak crime rates in 2003-2004.

In addition to analyzing total crime, I delved into specific crime categories often associated with poverty: break and enter, robbery, and prostitution. The three accompanying plots illustrate their respective rates over the years. Overall, there is a decreasing trend in the rates of all three crimes, with occasional exceptions such as robbery rates in Manitoba. Notably, British Columbia stands out with a significantly high rate of prostitution in 2004, doubling the number reported in Saskatchewan, which held the second-highest rate that year.

Note that the three plots below share the same legend as the plot above, so the legend is omitted for a cleaner display. The code is also not shown, as it reuses the code chunk above.
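
Since the plotting code is reused, one possible way to organize it is a small helper function. The sketch below is an assumption about how this could be done, not the code used for the report; `plot_rate()` and the toy data are hypothetical:

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: line plot of the rate for one violation type,
# with the legend hidden by default
plot_rate <- function(df, violation, show_legend = FALSE) {
  p <- df %>%
    filter(Violations == violation) %>%
    ggplot(aes(x = REF_DATE, y = `Rate per 100,000 population`, color = GEO)) +
    geom_line() +
    labs(x = "Year", y = "Rate per 100,000 population", title = violation) +
    theme_linedraw()
  if (!show_legend) {
    p <- p + theme(legend.position = "none")
  }
  p
}

# Made-up data in the same shape as the wide crime data frame
toy <- data.frame(
  REF_DATE = rep(2019:2021, times = 2),
  GEO = rep(c("Ontario", "Quebec"), each = 3),
  Violations = "Total robbery",
  `Rate per 100,000 population` = c(60, 55, 50, 40, 38, 35),
  check.names = FALSE
)

p <- plot_rate(toy, "Total robbery")
```

Calling the helper once per violation type would produce the three panels with identical styling.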

Presented below is a table summarizing the average crime rate across all crime types (excluding Total, all violations) categorized by province and year. The cells are color-coded by value, with lighter shades indicating higher values and darker shades representing smaller values. Upon scrolling through the table, it becomes evident that across all provinces, there is a discernible decreasing trend in the average crime rate, as evidenced by the darkening shades in each column. This observation aligns with the findings and inferences drawn from the preceding plots.

Table of Average Crime Rate Across All Types

| Year | Alberta | British Columbia | Manitoba | New Brunswick | Newfoundland and Labrador | Nova Scotia | Ontario | Prince Edward Island | Quebec | Saskatchewan |
|---|---|---|---|---|---|---|---|---|---|---|
| 1998 | 990.7389 | 1310.2795 | 1140.4897 | 746.1692 | 644.1653 | 875.0903 | 776.1408 | 686.4670 | 702.7424 | 1420.691 |
| 1999 | 996.9833 | 1243.8727 | 1139.4081 | 751.8170 | 616.3186 | 890.2262 | 705.8405 | 786.3608 | 649.5519 | 1372.294 |
| 2000 | 938.1432 | 1202.5849 | 1146.4238 | 719.6546 | 621.9546 | 807.6322 | 695.1511 | 739.1711 | 653.3770 | 1428.722 |
| 2001 | 979.9516 | 1233.4349 | 1213.3189 | 712.1708 | 617.2849 | 812.8562 | 672.0454 | 752.6111 | 634.2719 | 1518.370 |
| 2002 | 994.3392 | 1245.2322 | 1202.5438 | 726.1322 | 639.0897 | 816.2289 | 651.8003 | 850.6072 | 617.0995 | 1505.492 |
| 2003 | 1065.0392 | 1317.9670 | 1366.9247 | 758.0414 | 657.4530 | 893.6070 | 632.2778 | 915.8641 | 638.6003 | 1728.543 |
| 2004 | 1098.5319 | 1315.9251 | 1347.9319 | 780.0538 | 667.5708 | 915.0119 | 595.3678 | 893.0314 | 624.6361 | 1684.091 |
| 2005 | 1079.1975 | 1260.0459 | 1278.7128 | 692.0257 | 650.8968 | 859.9862 | 574.3292 | 812.8373 | 615.6142 | 1649.431 |
| 2006 | 1007.2778 | 1203.3584 | 1260.0861 | 648.6084 | 667.1994 | 856.8851 | 589.4051 | 737.7965 | 609.9108 | 1502.769 |
| 2007 | 1018.9281 | 1137.0500 | 1226.7060 | 608.8173 | 697.5675 | 803.5808 | 560.8689 | 670.0778 | 584.9914 | 1480.692 |
| 2008 | 984.9581 | 1046.3411 | 1081.2483 | 624.8186 | 696.9058 | 772.9158 | 537.7905 | 683.8757 | 589.6462 | 1412.581 |
| 2009 | 935.6630 | 983.8919 | 1153.1356 | 615.0719 | 716.9997 | 773.6461 | 521.8932 | 697.9243 | 578.2262 | 1403.483 |
| 2010 | 884.1551 | 933.4835 | 1043.1349 | 625.0964 | 725.7876 | 763.8995 | 496.5708 | 699.9311 | 562.8811 | 1432.178 |
| 2011 | 812.9859 | 880.2127 | 1012.0920 | 577.9611 | 705.3294 | 714.0051 | 468.0995 | 710.5243 | 519.5841 | 1408.674 |
| 2012 | 819.5778 | 856.2392 | 1000.7560 | 615.4689 | 696.8877 | 714.1444 | 449.8354 | 721.6681 | 510.7627 | 1311.701 |
| 2013 | 778.9997 | 803.0581 | 866.5694 | 538.5750 | 682.2809 | 639.5953 | 408.5759 | 669.5086 | 460.8854 | 1218.580 |
| 2014 | 783.3216 | 792.6345 | 840.0633 | 482.3608 | 636.2366 | 587.6563 | 391.6511 | 510.4745 | 421.5565 | 1155.519 |
| 2015 | 855.7324 | 827.4308 | 894.7058 | 527.3759 | 635.6389 | 551.9057 | 391.9432 | 451.0126 | 399.5150 | 1258.409 |
| 2016 | 869.6887 | 818.0165 | 956.7842 | 505.9946 | 630.6538 | 538.1619 | 399.0192 | 468.5732 | 405.6246 | 1317.321 |
| 2017 | 927.8811 | 781.4886 | 955.9997 | 549.3957 | 583.0170 | 549.4824 | 417.5035 | 439.5063 | 410.9189 | 1271.176 |
| 2018 | 886.8923 | 718.8339 | 912.0272 | 497.1263 | 502.9756 | 493.3066 | 400.3605 | 447.9967 | 339.5035 | 1188.911 |
| 2019 | 925.4612 | 796.9984 | 1016.7941 | 544.1014 | 553.7074 | 498.5129 | 404.5249 | 518.0077 | 326.3725 | 1151.130 |
| 2020 | 788.6307 | 735.6162 | 943.0979 | 564.7014 | 557.8670 | 491.2514 | 356.3917 | 446.2160 | 301.0057 | 1085.383 |
| 2021 | 730.3198 | 703.6836 | 839.5900 | 590.1656 | 602.4644 | 489.7486 | 360.7433 | 418.9484 | 312.5548 | 1106.370 |
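
The code behind this table is not shown, so the following is only a sketch of how a year-by-province table of average rates could be assembled, assuming the wide-format `crime_df` layout used earlier. The toy data frame and its values are hypothetical:

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the wide-format crime data frame
crime_toy <- data.frame(
  REF_DATE = 1998,
  GEO = rep(c("Alberta", "Quebec"), each = 3),
  Violations = rep(c("Total, all violations", "Total robbery",
                     "Total mischief"), times = 2),
  `Rate per 100,000 population` = c(9000, 100, 1500, 6000, 120, 800),
  check.names = FALSE
)

avg_rate_wide <- crime_toy %>%
  filter(Violations != "Total, all violations") %>%   # exclude the grand total
  group_by(REF_DATE, GEO) %>%
  summarise(avg_rate = mean(`Rate per 100,000 population`),
            .groups = "drop") %>%
  pivot_wider(names_from = GEO, values_from = avg_rate)  # one column per province

print(avg_rate_wide)
```

Passing the result to `knitr::kable()` would render it as a table in R Markdown; the color coding described above would need an additional styling step that is not sketched here.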

Looking at Income Data

Given the general decrease in crime rates, I am interested in exploring the trend of income to ascertain the potential existence of an association.

The following plot illustrates the average total income of each province over time. On the whole, the average total income of every province exhibits a steady upward trend. Notably, since 2003, Alberta has surpassed Ontario to become the province with the highest average total income. There is also a slight decrease in income in most provinces around 2019 (the table of average total income below shows British Columbia and Quebec as exceptions). Because this dip precedes the COVID-19 pandemic, which began in 2020, it should not be attributed to the pandemic without further evidence.

The box plot below also shows the average total income for different years, but this time combining data from all provinces. The pink dots represent the actual values for each year in each province. It is clear from the boxes that the average income increases over the years. Notably, between 2010 and 2016, there are some outliers with very high incomes, which correspond to Alberta when compared with the previous plot.

After examining total income, I delved into specific income sources: employment income, investment income, and market income, which are major income categories. The plots below illustrate that all three types of income are increasing. However, employment and market income show a steadier growth pattern, while investment income fluctuates dramatically from year to year. (As before, the legend is omitted for display purposes, since I am only interested in the overall trend and not in how the provinces compare with each other.)

The table below provides a summary of the average total income of all provinces from 1998 to 2021. The color scale used is consistent with that of the crime data frame, where lighter shades denote higher values. Over time, there is a discernible increase in income, as evidenced by the progressive lightening of colors.

Table of Average Total Income

| Year | Alberta | British Columbia | Manitoba | New Brunswick | Newfoundland and Labrador | Nova Scotia | Ontario | Prince Edward Island | Quebec | Saskatchewan |
|---|---|---|---|---|---|---|---|---|---|---|
| 1998 | 44804.17 | 40441.67 | 38208.33 | 34125.00 | 30316.67 | 34320.83 | 45100.00 | 32633.33 | 37412.50 | 37370.83 |
| 1999 | 44237.50 | 41416.67 | 38075.00 | 35112.50 | 31783.33 | 36316.67 | 46620.83 | 33058.33 | 38520.83 | 38000.00 |
| 2000 | 45795.83 | 41462.50 | 38841.67 | 35654.17 | 32533.33 | 37070.83 | 48170.83 | 34225.00 | 39716.67 | 38579.17 |
| 2001 | 47845.83 | 41916.67 | 39650.00 | 36537.50 | 32600.00 | 37950.00 | 48279.17 | 34445.83 | 40520.83 | 40308.33 |
| 2002 | 46983.33 | 42695.83 | 39629.17 | 35870.83 | 33270.83 | 38487.50 | 48091.67 | 35020.83 | 40791.67 | 40150.00 |
| 2003 | 47920.83 | 41787.50 | 40129.17 | 36108.33 | 33125.00 | 37716.67 | 47812.50 | 35325.00 | 40570.83 | 40554.17 |
| 2004 | 49666.67 | 42941.67 | 41029.17 | 36433.33 | 33470.83 | 38187.50 | 48170.83 | 36070.83 | 41600.00 | 40345.83 |
| 2005 | 51758.33 | 43800.00 | 41700.00 | 36400.00 | 35012.50 | 39283.33 | 48395.83 | 36800.00 | 40804.17 | 42166.67 |
| 2006 | 53820.83 | 44679.17 | 42562.50 | 37479.17 | 36858.33 | 40408.33 | 47258.33 | 38000.00 | 41604.17 | 44904.17 |
| 2007 | 56879.17 | 45850.00 | 44633.33 | 39200.00 | 39304.17 | 41320.83 | 48079.17 | 38120.83 | 42441.67 | 47750.00 |
| 2008 | 57337.50 | 46858.33 | 45875.00 | 39520.83 | 40412.50 | 40933.33 | 49212.50 | 39500.00 | 41925.00 | 49100.00 |
| 2009 | 57237.50 | 45875.00 | 45437.50 | 40466.67 | 40500.00 | 42362.50 | 48404.17 | 39587.50 | 42545.83 | 50650.00 |
| 2010 | 57650.00 | 45400.00 | 45129.17 | 40470.83 | 42195.83 | 41579.17 | 48891.67 | 39841.67 | 42662.50 | 50337.50 |
| 2011 | 59145.83 | 45633.33 | 44687.50 | 41508.33 | 44045.83 | 42516.67 | 48133.33 | 41354.17 | 43833.33 | 52483.33 |
| 2012 | 62166.67 | 46400.00 | 45195.83 | 41458.33 | 46112.50 | 43358.33 | 48645.83 | 40562.50 | 44308.33 | 52687.50 |
| 2013 | 61962.50 | 47854.17 | 46837.50 | 42095.83 | 48583.33 | 45220.83 | 49600.00 | 42529.17 | 44716.67 | 53658.33 |
| 2014 | 63200.00 | 47929.17 | 47129.17 | 42545.83 | 49654.17 | 45504.17 | 50062.50 | 42658.33 | 45041.67 | 55975.00 |
| 2015 | 64020.83 | 47437.50 | 48054.17 | 42104.17 | 50033.33 | 45445.83 | 51083.33 | 43308.33 | 44354.17 | 54908.33 |
| 2016 | 57637.50 | 47354.17 | 47425.00 | 43383.33 | 48466.67 | 45908.33 | 51191.67 | 43366.67 | 45983.33 | 52816.67 |
| 2017 | 59825.00 | 50650.00 | 49529.17 | 44858.33 | 48633.33 | 45612.50 | 52683.33 | 44245.83 | 46412.50 | 53900.00 |
| 2018 | 59458.33 | 51012.50 | 48875.00 | 46133.33 | 49154.17 | 46508.33 | 52650.00 | 44962.50 | 47091.67 | 52412.50 |
| 2019 | 58787.50 | 51791.67 | 48016.67 | 45779.17 | 48695.83 | 46437.50 | 52162.50 | 44616.67 | 48854.17 | 51304.17 |
| 2020 | 57991.67 | 54508.33 | 50758.33 | 48325.00 | 50183.33 | 48558.33 | 55320.83 | 47558.33 | 51000.00 | 53500.00 |
| 2021 | 59270.83 | 55733.33 | 50300.00 | 48358.33 | 51308.33 | 49062.50 | 56441.67 | 47737.50 | 52195.83 | 53025.00 |

Examining Crime and Income together

After separately exploring the two data sets, there appears to be a potential relationship between crime rate and income. However, further experimentation using both data sets together is necessary to confirm and understand this relationship more comprehensively.

filtered_census <- census_df %>% filter(`Income source` == "Total income") 
filtered_crime <- crime_df %>%filter(Violations=="Total, all violations")

joint_df <- inner_join(filtered_crime, filtered_census, by = c("REF_DATE", "GEO"))

ggplot(data = joint_df, aes(x = `Average income (excluding zeros)`, y = `Rate per 100,000 population`, color = GEO, size = 3)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm", se = FALSE, size = 1.0) + 
  guides(size = "none") + # remove "size" from legend
  labs(x = "Average Total Income (excluding zeros)", y = "Total Crime Rate per 100,000 population", title = "Average Total Income and Total Crime Rate") +
  theme_linedraw() + 
  scale_color_discrete(name = "Provinces") +
  theme(axis.text = element_text(size = 7)) +
  theme(plot.title = element_text(face = "bold")) +
  theme(legend.title = element_text(face = "bold")) +
  theme(plot.margin = margin(0.5, 0.5, 0.5, 0.5, "cm"))

The scatter plot demonstrates the relationship between average total income and total crime rate (ignoring crime type). Although the overall distribution of points may not reveal a strong relationship, a distinct pattern emerges once the points are colored by province. Average total income and total crime rate appear negatively correlated within every province, with the possible exception of Newfoundland and Labrador, where the relationship is much weaker, as indicated by a nearly horizontal line. To draw a confident conclusion, the actual correlation values between these variables need to be examined.

Correlation between Average Total Income and Total Crime Rate

| Province | Correlation |
|---|---|
| Alberta | -0.7023852 |
| British Columbia | -0.8154907 |
| Manitoba | -0.7392303 |
| New Brunswick | -0.5966958 |
| Newfoundland and Labrador | -0.0025743 |
| Nova Scotia | -0.9236636 |
| Ontario | -0.7583329 |
| Prince Edward Island | -0.8082244 |
| Quebec | -0.9447648 |
| Saskatchewan | -0.7695228 |

From this table, we can see that all correlation values are below zero, including that of Newfoundland and Labrador, which has a very weak negative correlation. Quebec exhibits the strongest correlation between average total income and total crime rate, with a value close to -1.
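
The code for these correlations is not shown. A minimal sketch of one way to compute a correlation per province, using made-up data (`joint_toy`, the province labels and all values are hypothetical and deliberately exaggerated):

```r
library(dplyr)

# Hypothetical joined data: two provinces, four years each
joint_toy <- data.frame(
  GEO = rep(c("Province A", "Province B"), each = 4),
  income = c(40, 42, 44, 46, 30, 32, 34, 36),
  rate = c(900, 850, 800, 750, 500, 520, 480, 510)
)

# Pearson correlation between income and crime rate within each province
cor_by_prov <- joint_toy %>%
  group_by(GEO) %>%
  summarise(Correlation = cor(income, rate), .groups = "drop")
print(cor_by_prov)

# Correlation over all points, ignoring province
pooled <- cor(joint_toy$income, joint_toy$rate)
print(pooled)
```

In this toy example both within-province correlations are negative while the pooled one is positive, because the higher-income province also has the higher crime level. This is the reason for looking at provinces separately rather than at the pooled scatter.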

The three smaller plots investigate the relationship between a specific income source and a particular type of violation, as indicated by the legend. In the first plot, which examines the relationship between robbery crime rate and employment income, all provinces display a negative trend except for Newfoundland and Labrador. Manitoba demonstrates a weak relationship, as evidenced by the considerable dispersion of points around the line.

Moving to the second plot, which contrasts market income with property crime rate, a strong and negative relationship is apparent. The third plot, depicting employment income versus prostitution crime rate, reveals varying degrees of association across provinces, with many showing a weak relationship. Notably, Ontario exhibits a positive relationship between employment income and prostitution crime rate.

Pick a Province to Look Closer

Since Ontario demonstrates a different relationship from the other provinces in the previous plot, I chose it for a closer look. The bar plot below shows the composition of crime incidents each year in Ontario. We see that the major categories are property crime and weapon violations.

Table of Correlations Between Crime Types and Income Sources

| Violations | Canada Pension Plan (CPP) and Quebec Pension Plan (QPP) benefits | Child benefits | Employment Insurance (EI) benefits | Employment income | Government transfers | Investment income | Market income | Old Age Security (OAS) and Guaranteed Income Supplement (GIS) | Other government transfers | Other income | Retirement income | Self-employment income | Social assistance | Total income | Wages, salaries and commissions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Total Cannabis Act | 0.6399496 | 0.1543812 | 0.1797670 | 0.0753819 | 0.4470022 | 0.8532210 | 0.1175474 | 0.5881777 | 0.4757809 | 0.3878745 | -0.3024679 | 0.7318504 | -0.0331746 | 0.3492095 | 0.0793002 |
| Total Criminal Code traffic violations | -0.7186518 | -0.7033720 | -0.3111971 | -0.8256471 | -0.6259757 | -0.7113270 | -0.8392439 | -0.7061295 | -0.4641019 | -0.6126601 | -0.7662505 | 0.4119972 | -0.2573561 | -0.8091403 | -0.8124589 |
| Total Federal Statute violations | -0.6119287 | -0.7838763 | -0.3673042 | -0.7543371 | -0.7932252 | -0.7381556 | -0.7753723 | -0.6782687 | -0.5159253 | -0.6914845 | -0.6938453 | 0.5508116 | -0.4218109 | -0.8030217 | -0.7323180 |
| Total Immigration and Refugee Protection Act | -0.8317978 | -0.7728983 | -0.4672506 | -0.6495341 | -0.6644657 | -0.7653287 | -0.7071820 | -0.8112917 | -0.6886685 | -0.4842990 | -0.7879141 | 0.7048638 | -0.7783455 | -0.7171724 | -0.6634925 |
| Total abduction | -0.8040213 | -0.7333317 | -0.4912742 | -0.8052992 | -0.6300930 | -0.6951519 | -0.8119927 | -0.7372497 | -0.5562309 | -0.6732663 | -0.7985790 | 0.4697212 | -0.4617595 | -0.7891757 | -0.8045523 |
| Total administration of justice violations | 0.2894308 | 0.5634451 | 0.0545464 | 0.5244904 | 0.3645871 | 0.3851740 | 0.5142525 | 0.4384029 | 0.2325288 | 0.5417441 | 0.3120033 | -0.3465106 | 0.2938743 | 0.4909069 | 0.5033958 |
| Total assaults against a peace officer | 0.5502388 | 0.6390806 | 0.5288927 | 0.6454176 | 0.6027397 | 0.4865436 | 0.6319311 | 0.5931247 | 0.7434956 | 0.6854612 | 0.4739395 | -0.2753145 | 0.2414854 | 0.6431781 | 0.6085258 |
| Total breaking and entering | -0.9225065 | -0.8356442 | -0.4924572 | -0.8626247 | -0.7424577 | -0.8514282 | -0.8942464 | -0.8359850 | -0.6090103 | -0.6813438 | -0.9195154 | 0.6695426 | -0.6051515 | -0.8816448 | -0.8744406 |
| Total cannabis, trafficking, production or distribution (pre-legalization) | -0.8042443 | -0.7928160 | -0.2177518 | -0.8090668 | -0.7933560 | -0.7772350 | -0.8522811 | -0.7282460 | -0.2614858 | -0.5575754 | -0.8674750 | 0.5659155 | -0.6541342 | -0.8558468 | -0.8056268 |
| Total cocaine, trafficking, production or distribution | 0.0080186 | -0.0443540 | 0.1815189 | 0.0002429 | -0.0576558 | -0.0816744 | 0.0032864 | -0.0408692 | 0.0252626 | 0.2132940 | -0.0442931 | 0.0848983 | -0.1543917 | -0.0123439 | -0.0202027 |
| Total distribution - Cannabis Act | 0.2532874 | 0.3697878 | 0.4186719 | 0.2564310 | 0.7846734 | 0.8875198 | 0.3338212 | 0.8638276 | 0.5252365 | 0.5014931 | -0.1813740 | 0.4965440 | -0.3997706 | 0.6914302 | 0.2530735 |
| Total drug violations | -0.3430541 | -0.5827351 | -0.2779386 | -0.5463706 | -0.6396933 | -0.5269176 | -0.5575712 | -0.4479062 | -0.3701146 | -0.5511335 | -0.4286892 | 0.3378455 | -0.2088664 | -0.5953362 | -0.5170650 |
| Total fail to stop or remain | -0.4034206 | -0.5030218 | -0.2803077 | -0.6386394 | -0.5070987 | -0.4955722 | -0.6375927 | -0.4195422 | -0.3208439 | -0.5573177 | -0.4472545 | 0.1577972 | 0.1608018 | -0.6228607 | -0.6104423 |
| Total firearms, use of, discharge, pointing | 0.4841041 | 0.7126655 | 0.3222550 | 0.7080560 | 0.7362454 | 0.6864169 | 0.7176776 | 0.5769022 | 0.5186607 | 0.6327615 | 0.5410733 | -0.4276664 | 0.1780762 | 0.7437977 | 0.6955827 |
| Total forcible confinement or kidnapping | -0.0720063 | 0.0112625 | 0.1985305 | 0.0983675 | -0.0452937 | -0.1057013 | 0.0631279 | -0.0020276 | 0.0240264 | 0.3024561 | -0.0950879 | 0.2446965 | -0.2214532 | 0.0370225 | 0.0638929 |
| Total impaired driving | -0.8557310 | -0.7818825 | -0.3409620 | -0.8188966 | -0.6783186 | -0.7761839 | -0.8456043 | -0.8438246 | -0.5720862 | -0.5889177 | -0.8976395 | 0.6189236 | -0.6575261 | -0.8275574 | -0.8216312 |
| Total importation and exportation - Cannabis Act | 0.6767907 | 0.1838120 | 0.2046896 | 0.1116378 | 0.4052830 | 0.8176372 | 0.1478339 | 0.5465302 | 0.4199046 | 0.4234147 | -0.3605503 | 0.7744642 | -0.0363785 | 0.3419987 | 0.1164391 |
| Total mischief | -0.9421800 | -0.8471372 | -0.3820440 | -0.8534749 | -0.7563984 | -0.8918294 | -0.8895470 | -0.8566592 | -0.6105450 | -0.5876269 | -0.9529915 | 0.7451827 | -0.7189232 | -0.8816163 | -0.8709267 |
| Total offences in relation to sexual services | 0.1489431 | 0.0213727 | -0.4663334 | 0.0241671 | -0.3243725 | -0.1657629 | 0.0339013 | 0.5777400 | 0.0346752 | -0.1125754 | 0.0358541 | -0.5972165 | 0.6151757 | -0.1268857 | 0.0749833 |
| Total other Controlled Drugs and Substances Act drugs, trafficking, production or distribution | 0.6495872 | 0.6406151 | 0.6388294 | 0.7081229 | 0.6977985 | 0.6213830 | 0.7108679 | 0.6006871 | 0.6900359 | 0.6663725 | 0.6454468 | -0.3011561 | 0.1776899 | 0.7286213 | 0.7085948 |
| Total other Criminal Code traffic violations | -0.3874440 | -0.4263617 | -0.1928496 | -0.5842861 | -0.3942984 | -0.4435046 | -0.5827052 | -0.3781473 | -0.2337219 | -0.4494342 | -0.4256197 | 0.1133769 | 0.1587207 | -0.5513948 | -0.5615534 |
| Total other Criminal Code violations | -0.4582604 | -0.1970822 | -0.2703928 | -0.3107493 | -0.2131728 | -0.3098238 | -0.3272181 | -0.2927299 | -0.2581102 | -0.1374758 | -0.4353647 | 0.1195234 | -0.1448995 | -0.3075077 | -0.3212000 |
| Total other Federal Statutes | -0.5031877 | -0.4959433 | -0.1536789 | -0.4251374 | -0.5025090 | -0.4955586 | -0.4627664 | -0.5004846 | -0.3190501 | -0.2850063 | -0.5950219 | 0.5122293 | -0.5006676 | -0.4867740 | -0.4229529 |
| Total other assaults | -0.7774742 | -0.8203756 | -0.3529050 | -0.8408717 | -0.7923596 | -0.8219512 | -0.8592494 | -0.7914887 | -0.5078342 | -0.5963271 | -0.8563039 | 0.6342825 | -0.5585913 | -0.8675191 | -0.8394781 |
| Total other violations | -0.6948372 | -0.6049911 | -0.3362882 | -0.7393387 | -0.4829557 | -0.6199522 | -0.7453292 | -0.6079443 | -0.4412007 | -0.5557644 | -0.6815783 | 0.3165638 | -0.2428936 | -0.6997612 | -0.7285573 |
| Total other violations causing death | -0.7013575 | -0.5561567 | -0.4128154 | -0.5330036 | -0.5441578 | -0.6001823 | -0.5702280 | -0.6020835 | -0.5816201 | -0.4667281 | -0.5995986 | 0.4220517 | -0.5101842 | -0.5804457 | -0.5177336 |
| Total other violent violations | 0.6964391 | 0.7813993 | 0.2698997 | 0.6861032 | 0.7566179 | 0.7006098 | 0.7205615 | 0.7710238 | 0.5678581 | 0.5599879 | 0.7078508 | -0.7407048 | 0.5774332 | 0.7512794 | 0.7146516 |
| Total possession - Cannabis Act | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| Total possession of stolen property | -0.8196812 | -0.7026968 | -0.2209310 | -0.7205022 | -0.6337884 | -0.8151877 | -0.7595498 | -0.6951892 | -0.4070943 | -0.4134656 | -0.8648735 | 0.7192309 | -0.6326056 | -0.7496622 | -0.7671210 |
| Total production - Cannabis Act | -0.7823286 | -0.3321634 | -0.2575272 | -0.4347831 | 0.8863796 | 0.8552884 | -0.3390936 | 0.9509705 | 0.8444271 | -0.5354741 | 0.9226652 | -0.9901246 | -0.1265695 | 0.3094884 | -0.4511187 |
| Total property crime violations | -0.8978577 | -0.7543288 | -0.4203821 | -0.7746335 | -0.6737176 | -0.8114778 | -0.8121705 | -0.7934432 | -0.5476628 | -0.5366632 | -0.8962551 | 0.6683060 | -0.6368281 | -0.8005713 | -0.7949070 |
| Total prostitution | -0.8489033 | -0.7559321 | -0.5186475 | -0.9180496 | -0.7359056 | -0.7157644 | -0.9652457 | -0.6890291 | -0.5180918 | -0.5513721 | -0.8177643 | 0.2252633 | -0.2280487 | -0.9558729 | -0.9036327 |
| Total robbery | -0.5709575 | -0.6007255 | -0.2973412 | -0.6191589 | -0.6429332 | -0.6640692 | -0.6398530 | -0.5423024 | -0.3066751 | -0.3943991 | -0.6745697 | 0.5084582 | -0.3464958 | -0.6596610 | -0.6230131 |
| Total sale - Cannabis Act | 0.8305721 | -0.1995422 | -0.1936661 | -0.2414690 | 0.0859602 | 0.7246261 | -0.2315929 | 0.2651273 | 0.4681703 | 0.0796278 | -0.2133301 | 0.7179241 | 0.3914490 | -0.0892274 | -0.2326984 |
| Total sexual violations against children | 0.8340545 | 0.9631761 | 0.4861737 | 0.9038366 | 0.8848393 | 0.8979768 | 0.9343577 | 0.8850667 | 0.6683254 | 0.7698027 | 0.8622806 | -0.7587430 | 0.6418588 | 0.9493437 | 0.9126578 |
| Total theft of motor vehicle | -0.8934673 | -0.7086737 | -0.3710112 | -0.6914095 | -0.6038020 | -0.7685217 | -0.7373869 | -0.7760287 | -0.5825151 | -0.4755032 | -0.8544672 | 0.6865498 | -0.7347073 | -0.7248222 | -0.7254220 |
| Total theft over $5,000 (non-motor vehicle) | -0.6491545 | -0.4225805 | -0.3014013 | -0.4782235 | -0.3342665 | -0.4653237 | -0.4965087 | -0.5338034 | -0.3649756 | -0.3222718 | -0.6086177 | 0.3643097 | -0.4855605 | -0.4693896 | -0.4890690 |
| Total theft under $5,000 (non-motor vehicle) | -0.8094329 | -0.7399273 | -0.4824113 | -0.7709340 | -0.7138874 | -0.7738704 | -0.8009891 | -0.7576249 | -0.5158623 | -0.5842126 | -0.8346564 | 0.5971284 | -0.4862269 | -0.8023110 | -0.7757068 |
| Total trafficking in stolen property | 0.8607122 | 0.8837019 | 0.3888328 | 0.8627692 | 0.8116025 | 0.9082499 | 0.9003260 | 0.8123087 | 0.5383496 | 0.6748250 | 0.9082349 | -0.7849822 | 0.6359333 | 0.9041812 | 0.8854195 |
| Total violent Criminal Code violations | -0.6680451 | -0.4037117 | -0.1343821 | -0.3982306 | -0.3211282 | -0.5137263 | -0.4445947 | -0.5525946 | -0.2576917 | -0.1123500 | -0.6785080 | 0.5155067 | -0.7056360 | -0.4259395 | -0.4327636 |
| Total weapons violations | -0.5488710 | -0.2894091 | -0.1254973 | -0.4402386 | -0.1719470 | -0.4010723 | -0.4336278 | -0.3638832 | -0.2474038 | -0.1334781 | -0.5087586 | 0.1434243 | -0.2617335 | -0.3789758 | -0.4719344 |
| Total, all Criminal Code violations (excluding traffic) | -0.8604485 | -0.6786744 | -0.3832652 | -0.7068606 | -0.6046292 | -0.7527368 | -0.7452086 | -0.7421601 | -0.5006376 | -0.4571058 | -0.8581022 | 0.6179758 | -0.6235580 | -0.7310706 | -0.7291286 |
| Total, all Criminal Code violations (including traffic) | -0.8634277 | -0.6877659 | -0.3841524 | -0.7209247 | -0.6126964 | -0.7594233 | -0.7584953 | -0.7489895 | -0.5046405 | -0.4700883 | -0.8635202 | 0.6149735 | -0.6127094 | -0.7434050 | -0.7417003 |
| Total, all violations | -0.8875784 | -0.7308819 | -0.4025536 | -0.7606958 | -0.6587512 | -0.7967699 | -0.7988669 | -0.7818161 | -0.5315050 | -0.5121003 | -0.8942667 | 0.6414355 | -0.6288462 | -0.7864432 | -0.7790928 |
| Total, possession, other Controlled Drugs and Substances Act drugs | 0.8702986 | 0.9371958 | 0.5100074 | 0.9443876 | 0.9041758 | 0.8971434 | 0.9676695 | 0.8962826 | 0.7001886 | 0.8034666 | 0.9170933 | -0.7102608 | 0.5627222 | 0.9800383 | 0.9418946 |

The table summarizes the correlations between each type of violation and each income source in Ontario. COVID-19 benefits are excluded from the analysis because they exist only for the pandemic years, which leaves too few observations. The strongest correlation (about -0.99) is between self-employment income and production under the Cannabis Act. However, this relationship is not reliable because it rests on only 3 observations, as illustrated in the plot below.
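To make the small-sample caveat concrete, a correlation can be reported together with the number of complete pairs behind it. The following is a minimal sketch on toy vectors, not the project data; the helper name `cor_with_n` is hypothetical and introduced only for illustration.

```r
# Hypothetical helper: correlation plus the number of complete pairs behind it.
cor_with_n <- function(x, y) {
  ok <- complete.cases(x, y)   # TRUE where both values are present
  n <- sum(ok)
  # With fewer than 3 complete pairs, a correlation is undefined or trivially +/-1.
  r <- if (n >= 3) cor(x[ok], y[ok]) else NA_real_
  c(r = r, n = n)
}

# Toy vectors (made up): only 3 years have a non-missing crime rate.
income <- c(100, 110, 120, 130, 140)
rate   <- c(NA, NA, 5.0, 4.1, 3.0)
result <- cor_with_n(income, rate)
result
```

A coefficient close to -1 or 1 that comes with n = 3 should be read as weak evidence, since almost any three points produce a strong correlation.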

The second strongest correlation, with a coefficient of 0.98, is between total income and incidents of possession of other Controlled Drugs and Substances Act drugs. Because more observations are available, this relationship is more credible. As income increases, the number of possession incidents for these drugs also tends to rise. So although total income is negatively correlated with the total crime rate in Ontario, as found in the previous analyses, it is positively correlated with incidents of possession of other Controlled Drugs and Substances Act drugs.

Group by Income Level

I created four levels of average total income using its quartiles and mean: values up to the first quartile are “Low”, from the first quartile to the mean “Med-Low”, from the mean to the third quartile “Med-High”, and above the third quartile “High”.

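# Keep only the rows that report total income
filter_census <- census_df %>% filter(`Income source` == "Total income")
# The cut points below were read off the output of summary():
# summary(filter_census$`Average income (excluding zeros)`)
breaks <- c(-Inf, 40460, 45077, 48859, Inf)
# Assign each observation to one of the four income levels
filter_census$income_level <- cut(filter_census$`Average income (excluding zeros)`, 
                                  breaks = breaks, labels = c("Low", "Med-Low", "Med-High", "High"))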
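The cut points above are hard-coded, so they would go stale if the data were updated. As an alternative, they could be derived from the data itself. This is a minimal sketch on a toy data frame (`toy` and `avg_income` are illustrative names, not the project's), assuming the middle cut point is the mean as described above.

```r
# Toy stand-in for filter_census (values are illustrative, not the real data).
toy <- data.frame(
  avg_income = c(38000, 41000, 43500, 45000, 47000, 49500, 52000, 56000)
)

# Derive the cut points from the data: first quartile, mean, third quartile.
q <- quantile(toy$avg_income, probs = c(0.25, 0.75), names = FALSE)
breaks <- c(-Inf, q[1], mean(toy$avg_income), q[2], Inf)

# cut() requires increasing breaks; a strongly skewed variable could push
# the mean outside the quartiles, so check before cutting.
stopifnot(!is.unsorted(breaks, strictly = TRUE))

toy$income_level <- cut(toy$avg_income, breaks = breaks,
                        labels = c("Low", "Med-Low", "Med-High", "High"))
table(toy$income_level)
```

Using the median instead of the mean as the middle cut point would guarantee ordered breaks, at the cost of changing the definition of the two middle levels.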

In the box plot presented below, the total crime rate shows no clear downward trend across levels of total income. This contrasts with the earlier province-level results, where the total crime rate fell as average total income rose. One plausible explanation is that the relationship holds within each province but is masked by other province-specific factors, such as demographics. When all observations are pooled, the relationship therefore becomes less apparent.
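A small made-up example shows how a relationship that is clear within each province can nearly vanish when provinces are pooled. The numbers below are invented purely for illustration (provinces "A" and "B" are hypothetical) and are not drawn from the Statistics Canada tables.

```r
# Toy data (made up): two provinces, each with a clear negative
# income-crime slope, but at different income and crime levels.
toy <- data.frame(
  GEO    = rep(c("A", "B"), each = 3),
  income = c(40, 42, 44, 48, 50, 52),            # thousands of dollars
  rate   = c(9000, 8800, 8600, 9100, 8900, 8700) # incidents per 100,000
)

# Slope of rate on income within each province.
within_slopes <- sapply(split(toy, toy$GEO), function(d) {
  unname(coef(lm(rate ~ income, data = d))["income"])
})

# Slope when all observations are pooled and province is ignored.
pooled_slope <- unname(coef(lm(rate ~ income, data = toy))["income"])

within_slopes
pooled_slope
```

In this toy case each province has a slope of -100, while the pooled slope is much closer to zero, because the richer province also has a higher baseline crime rate.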

Fitting Statistical Models

Because the earlier plots showed that the slope of the relationship between average total income and total crime rate differs across provinces, we fit a linear model with an income-by-province interaction, so that each province can have its own slope.

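library(broom)
# Interaction model: each province gets its own intercept and its own income slope
model <- lm(`Rate per 100,000 population` ~ `Average income (excluding zeros)` * GEO,
            data = joint_df)
# Collect the coefficients in a data frame and render them as a striped HTML table
tidy_coeffs <- tidy(model)
coef_table <- tidy_coeffs %>%
  kable("html") %>%
  kable_styling(bootstrap_options = "striped", full_width = TRUE)
coef_table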
term estimate std.error statistic p.value
(Intercept) 14793.6650192 1437.5519874 10.2908731 0.0000000
Average income (excluding zeros) -0.0965352 0.0258743 -3.7309253 0.0002428
GEOBritish Columbia 13870.9677393 2312.3835770 5.9985583 0.0000000
GEOManitoba 7946.1906789 2296.0365821 3.4608293 0.0006468
GEONew Brunswick -2943.0958277 2103.8238576 -1.3989269 0.1632431
GEONewfoundland and Labrador -8091.9410527 1705.7594792 -4.7438934 0.0000038
GEONova Scotia 5230.7520879 2193.8216748 2.3843105 0.0179604
GEOOntario 6850.6041303 3311.9257794 2.0684655 0.0397653
GEOPrince Edward Island 1789.8353436 2029.0026916 0.8821257 0.3786716
GEOQuebec 1869.9125139 2382.7985390 0.7847548 0.4334416
GEOSaskatchewan 7458.4753196 1887.5177478 3.9514729 0.0001047
Average income (excluding zeros):GEOBritish Columbia -0.2946092 0.0467722 -6.2988076 0.0000000
Average income (excluding zeros):GEOManitoba -0.1693983 0.0477122 -3.5504187 0.0004703
Average income (excluding zeros):GEONew Brunswick -0.0316576 0.0458147 -0.6909929 0.4902984
Average income (excluding zeros):GEONewfoundland and Labrador 0.0963984 0.0338286 2.8496116 0.0047929
Average income (excluding zeros):GEONova Scotia -0.2025528 0.0469679 -4.3125776 0.0000244
Average income (excluding zeros):GEOOntario -0.2319253 0.0654035 -3.5460653 0.0004777
Average income (excluding zeros):GEOPrince Edward Island -0.1471245 0.0441284 -3.3340081 0.0010044
Average income (excluding zeros):GEOQuebec -0.1635189 0.0506113 -3.2308755 0.0014233
Average income (excluding zeros):GEOSaskatchewan -0.0726509 0.0362467 -2.0043456 0.0462590

The linear regression model relates the crime rate per 100,000 population to average income (excluding zeros), with the categorical variable GEO representing the provinces. The reference province is the one without its own GEO term in the table, which appears to be Alberta. Several findings stand out. First, the income coefficient (-0.097) is negative and significant, so in the reference province higher average income is associated with a lower crime rate. Second, the GEO terms show that the baseline level differs by province, with British Columbia having the largest positive intercept shift relative to the reference. Third, the interaction terms show that the income slope varies across provinces: it is most negative in British Columbia (interaction of -0.295), it does not differ significantly from the reference in New Brunswick (p = 0.49), and in Newfoundland and Labrador the positive interaction (0.096) almost exactly offsets the reference slope, leaving an essentially flat relationship. Overall, the adjusted R-squared of 0.9328 indicates a close fit, and both income and province are significantly associated with the crime rate, although the model describes association rather than causation.
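Because the model includes interaction terms, the income slope for a given province is the reference slope plus that province's interaction coefficient. The following sketch does this arithmetic with the estimates copied from the coefficient table; the variable names are my own and are not part of the analysis code.

```r
# Income coefficients copied from the fitted-model table.
base_slope <- -0.0965352   # income slope for the reference province
interaction <- c(
  "British Columbia"          = -0.2946092,
  "Manitoba"                  = -0.1693983,
  "New Brunswick"             = -0.0316576,
  "Newfoundland and Labrador" =  0.0963984,
  "Nova Scotia"               = -0.2025528,
  "Ontario"                   = -0.2319253,
  "Prince Edward Island"      = -0.1471245,
  "Quebec"                    = -0.1635189,
  "Saskatchewan"              = -0.0726509
)

# Implied slope per province = reference slope + that province's interaction term.
province_slope <- c("(reference)" = base_slope, base_slope + interaction)
round(sort(province_slope), 4)
```

For example, the implied slope for British Columbia is about -0.39, meaning roughly 391 fewer incidents per 100,000 population for each additional $1,000 of average income, while the slope for Newfoundland and Labrador is approximately zero.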

Summary

“What you found so far from your data in terms of the formulated question.”

Recall that the question of interest is: “How does crime rate relate to income in Canada?”

  1. Looking at Crime Data:
  2. Looking at Income Data:
  3. Examining Crime and Income Together:
  4. Zoom on Province - Ontario:
  5. Group by Income Level:
  6. Fitting Statistical Models: